Lecture 3: KL-divergence and connections
Author
Abstract
1 Recap
Recall some important facts about entropy and mutual information from the previous lecture:
• $H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$
• $I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)$
• $I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y,Z)$
• $I(X;Y) = 0$ if $X$ and $Y$ are independent
• $I(X;Y) \ge 0$ or, equivalently, $H(X) \ge H(X \mid Y)$
Exercise 1.1 Prove that $H(X \mid Y) = 0$ if and only if $X = g(Y)$ for some function $g$.
2 More mutual information
2.1 Mutual information chain rule
We begin by proving the chain rule for mutual information.
Theorem 2.1 (Chain rule for mutual information)
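The recap identities can be checked numerically on a small joint distribution. This is a minimal sketch, assuming a discrete joint pmf stored as a Python dict keyed by `(x, y)` and entropies measured in bits; the helper names (`entropy`, `marginal`, `conditional_entropy`, `mutual_information`) and both example distributions are hypothetical, not taken from the lecture. A numerical check is not a proof of Exercise 1.1; it only illustrates the forward direction on one instance, $X = Y \bmod 2$.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical representation: joint pmf p(x, y) as {(x, y): probability}.
Joint = Dict[Tuple[int, int], float]


def entropy(dist: Dict) -> float:
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)


def marginal(joint: Joint, axis: int) -> Dict[int, float]:
    """Marginal pmf of coordinate `axis` (0 for X, 1 for Y)."""
    out: Dict[int, float] = defaultdict(float)
    for xy, p in joint.items():
        out[xy[axis]] += p
    return dict(out)


def conditional_entropy(joint: Joint, given: int) -> float:
    """Entropy of the other coordinate given coordinate `given`, computed
    directly as sum p(a, b) * log2(p(b) / p(a, b)).
    given=1 yields H(X|Y); given=0 yields H(Y|X)."""
    pb = marginal(joint, given)
    return sum(p * math.log2(pb[xy[given]] / p) for xy, p in joint.items() if p > 0)


def mutual_information(joint: Joint) -> float:
    """I(X;Y) = sum p(x, y) * log2(p(x, y) / (p(x) p(y)))."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)


# Example joint distribution over X, Y in {0, 1}.
example_joint: Joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Exercise 1.1, one instance: Y uniform on {0, 1, 2} and X = g(Y) = Y mod 2.
deterministic_joint: Joint = {(y % 2, y): 1 / 3 for y in range(3)}

if __name__ == "__main__":
    hx = entropy(marginal(example_joint, 0))
    hy = entropy(marginal(example_joint, 1))
    hxy = entropy(example_joint)
    print("H(X,Y) =", hxy, "| H(X) + H(Y|X) =", hx + conditional_entropy(example_joint, 0))
    print("I(X;Y) =", mutual_information(example_joint), "| H(X)+H(Y)-H(X,Y) =", hx + hy - hxy)
    print("H(X|Y) with X = Y mod 2:", conditional_entropy(deterministic_joint, 1))
```

Computing $H(X \mid Y)$ and $I(X;Y)$ directly from their definitions, rather than through the chain rule, keeps the comparison with the recap identities from being circular.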
Similar resources
Adaptive Data Analysis
These lecture notes are based on [BNS+16] and were compiled for a guest lecture in the course CS229r “Information Theory in Computer Science” taught by Madhu Sudan at Harvard University in Spring 2016. Menu for today’s lecture:
• Motivation
• Model
• Overfitting & comparison to non-adaptive data analysis
• What can we do adaptively?
• KL Divergence recap
• Proof
• Differential privacy (time per...
3 Proof of Theorem 1 using the Primal-Dual method
In this section we show how the refined upper bound on the regret of the EXP algorithm, proved using the potential function approach (KL divergence), also gives us a better bound for the expert game setup with bandit feedback. Last lecture we showed how, in the case of expert prediction with bandit feedback using the Exp3 algorithm, the regret is upper bounded by $T^{2/3} n^{1/3}$ using a rough upper boun...
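The Exp3 algorithm mentioned above can be sketched as follows. This is a minimal version of the textbook Exp3 update, assuming rewards in $[0, 1]$, $n$ arms, and a fixed mixing rate `gamma`; the function name `exp3`, the `reward_fn(t, arm)` interface, and the toy reward sequence are hypothetical, and the lecture's exact variant and parameter choices may differ.

```python
import math
import random
from typing import Callable, List, Tuple


def exp3(
    n_arms: int,
    horizon: int,
    reward_fn: Callable[[int, int], float],
    gamma: float,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Textbook Exp3 for adversarial bandits with rewards in [0, 1].

    Returns the total reward collected and the final sampling distribution.
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights, kept in log space to avoid overflow

    def distribution() -> List[float]:
        # Exponential weights mixed with the uniform distribution at rate gamma.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        return [(1 - gamma) * wi / total + gamma / n_arms for wi in w]

    total_reward = 0.0
    for t in range(horizon):
        p = distribution()
        arm = rng.choices(range(n_arms), weights=p, k=1)[0]
        reward = reward_fn(t, arm)  # bandit feedback: only the pulled arm is observed
        total_reward += reward
        # Importance-weighted estimate, unbiased for the pulled arm's reward;
        # unpulled arms get an estimate of 0, so their weights stay unchanged.
        estimate = reward / p[arm]
        log_w[arm] += gamma * estimate / n_arms
    return total_reward, distribution()
```

For instance, with a hypothetical `reward_fn = lambda t, arm: 1.0 if arm == 0 else 0.0`, calling `exp3(3, 2000, reward_fn, gamma=0.1)` shifts probability toward arm 0 until it approaches its ceiling of $(1 - \gamma) + \gamma / n$; the uniform mixing keeps every arm's probability at least $\gamma / n$, which bounds the importance weights.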
CS 674/INFO 630: Advanced Language Technologies
$P_{\vec{\theta}} : V \mapsto [0, 1]$, where $\vec{\theta}$ is an element of the $m$-dimensional probability simplex. Hence the probability assigned to a single term $v_j$ is defined as $P_{\vec{\theta}}(v_j) \stackrel{\mathrm{def}}{=} \theta[j]$. Also recall from the previous lecture that the Kullback–Leibler (KL) divergence between two probability distributions $P_{\vec{\theta}}$ and $P_{\vec{\theta}'}$, i.e., the expected log-likelihood ratio with respect to $P_{\vec{\theta}}$, is defined as: $D(P_{\vec{\theta}} \,\|\, P_{\vec{\theta}}$...
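For unigram models of this form, taking the expected log-likelihood ratio with respect to $P_{\vec{\theta}}$ gives $\sum_j \theta[j] \ln(\theta[j] / \theta'[j])$. A minimal sketch, assuming natural logarithms (nats), the convention $0 \ln 0 = 0$, and an infinite divergence whenever $\theta'[j] = 0 < \theta[j]$; the function name `kl_divergence` is hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(theta: Sequence[float], theta_prime: Sequence[float]) -> float:
    """D(P_theta || P_theta') = sum_j theta[j] * ln(theta[j] / theta'[j]), in nats.

    Terms with theta[j] == 0 contribute 0; if theta[j] > 0 but theta'[j] == 0,
    the divergence is infinite.
    """
    if len(theta) != len(theta_prime):
        raise ValueError("distributions must be over the same vocabulary")
    total = 0.0
    for p, q in zip(theta, theta_prime):
        if p == 0:
            continue
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total
```

For example, `kl_divergence([0.5, 0.5], [0.25, 0.75])` is about 0.1438 nats, while swapping the arguments gives about 0.1308, a reminder that KL divergence is not symmetric.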
Notes on Kullback-Leibler Divergence and Likelihood
The Kullback-Leibler (KL) divergence is a fundamental equation of information theory that quantifies the proximity of two probability distributions. Although difficult to understand by examining the equation, an intuition and understanding of the KL divergence arises from its intimate relationship with likelihood theory. We discuss how KL divergence arises from likelihood theory in an attempt t...
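One concrete form of the link between KL divergence and likelihood: if $x_1, \dots, x_n$ are drawn i.i.d. from $P$, the normalized log-likelihood ratio $\frac{1}{n} \sum_i \ln \frac{P(x_i)}{Q(x_i)}$ converges to $D(P \,\|\, Q)$ by the law of large numbers. The sketch below checks this by simulation for two distributions on a finite alphabet. It assumes $Q(x) > 0$ wherever $P(x) > 0$; the function names and example distributions are hypothetical, not taken from the notes summarized above.

```python
import math
import random
from typing import Sequence


def exact_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def average_log_likelihood_ratio(
    p: Sequence[float], q: Sequence[float], n_samples: int, seed: int = 0
) -> float:
    """Draw n_samples i.i.d. symbols from p; return (1/n) * sum ln(p(x) / q(x))."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(math.log(p[x] / q[x]) for x in samples) / n_samples


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.25, 0.75]
    print("exact D(p||q):        ", exact_kl(p, q))
    print("Monte Carlo estimate: ", average_log_likelihood_ratio(p, q, 100_000))
```

Multiplying the average by $n$ gives the log-likelihood ratio of the whole sample, so $D(P \,\|\, Q)$ is the per-sample rate at which data from $P$ favors $P$ over $Q$.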
Kullback-Leibler Divergence for Nonnegative Matrix Factorization
The I-divergence or unnormalized generalization of Kullback-Leibler (KL) divergence is commonly used in Nonnegative Matrix Factorization (NMF). This divergence has the drawback that its gradients with respect to the factorizing matrices depend heavily on the scales of the matrices, and learning the scales in gradient-descent optimization may require many iterations. This is often handled by expl...
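For reference, the I-divergence between a nonnegative matrix $V$ and its approximation $WH$ is $\sum_{ij} \big( V_{ij} \ln \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \big)$. The sketch below minimizes it with the classic Lee-Seung multiplicative updates, a standard baseline and not the method proposed in the abstract above. It assumes NumPy, a strictly positive random initialization, and hypothetical function names (`i_divergence`, `nmf_kl`).

```python
import numpy as np


def i_divergence(V: np.ndarray, L: np.ndarray) -> float:
    """Generalized KL (I-divergence): sum V*ln(V/L) - V + L, with 0*ln 0 = 0."""
    mask = V > 0
    return float(np.sum(V[mask] * np.log(V[mask] / L[mask])) - V.sum() + L.sum())


def nmf_kl(V: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0):
    """Lee-Seung multiplicative updates for min D(V || W H) with W, H >= 0.

    Returns W (m x rank), H (rank x n) and the divergence after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1  # strictly positive init keeps W @ H > 0
    H = rng.random((rank, n)) + 0.1
    history = [i_divergence(V, W @ H)]
    for _ in range(n_iter):
        # H_aj <- H_aj * sum_i W_ia V_ij / (WH)_ij  /  sum_i W_ia
        H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
        # W_ia <- W_ia * sum_j H_aj V_ij / (WH)_ij  /  sum_j H_aj
        W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]
        history.append(i_divergence(V, W @ H))
    return W, H, history
```

Replacing $W$ by $WD$ and $H$ by $D^{-1}H$ for any positive diagonal $D$ leaves $WH$, and hence the divergence, unchanged while changing the gradients with respect to $W$ and $H$; this is the scale dependence the abstract refers to. The multiplicative form sidesteps step-size tuning, and the divergence is non-increasing under these updates.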